We consider the problem of constructing robust nonparametric
confidence intervals and tests of hypothesis for the median when the
data distribution is unknown and the data may contain a small fraction
of contamination. We propose a modification of the sign test
(and its associated confidence interval) which attains the nominal
significance level (probability coverage) for any distribution in the
contamination neighborhood of a continuous distribution. We also
define some measures of robustness and efficiency under contamination
for confidence intervals and tests. These measures are computed
for the proposed procedures.

1. Introduction. Often, a fraction of the data is contaminated by outliers
and other types of low-quality observations. For example, a slight shift in
one of several similar instruments used in an experiment may cause a small
but consistent bias in a few observations. We are often interested in drawing
inference from the uncontaminated part of the data, whose distribution we
call the “target distribution.” It is well known that robust point estimates
successfully limit the effect of a small fraction of contamination in the data.
Unfortunately, naive “robust” confidence intervals constructed around robust
point estimates are not that successful. See Fraiman, Yohai and Zamar
(2001).

To allow for a fraction $\varepsilon$ of contamination in the data we assume that
the actual distribution $G$ belongs to the contamination neighborhood of the
target distribution $F$,

(1.1) $\mathcal{F}_{\varepsilon}(F) = \{G : G = (1-\varepsilon)F + \varepsilon H\}$,

where $H$ is arbitrary and $0 < \varepsilon < 1/2$.
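
To make the neighborhood (1.1) concrete, the following minimal Python sketch draws a sample from $G = (1-\varepsilon)F + \varepsilon H$ by mixing draws from $F$ and $H$. The function name `sample_contaminated`, the choice $F = N(0,1)$ and the point-mass contaminant at 10 are illustrative assumptions, not part of the paper.

```python
import random

def sample_contaminated(n, eps, draw_target, draw_contam, rng):
    """Draw n observations from G = (1 - eps) * F + eps * H.

    draw_target and draw_contam are hypothetical callables that take the
    random generator and return one draw from F and from H, respectively.
    """
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 1/2)")
    # Each observation comes from H with probability eps, otherwise from F.
    return [draw_contam(rng) if rng.random() < eps else draw_target(rng)
            for _ in range(n)]

rng = random.Random(0)
# Assumed example: F = N(0, 1), H = point mass at 10 (asymmetric contamination).
xs = sample_contaminated(1000, 0.10, lambda r: r.gauss(0.0, 1.0),
                         lambda r: 10.0, rng)
print(sum(1 for x in xs if x == 10.0) / len(xs))  # observed contaminated fraction
```

Any other pair of samplers can be passed in, since $H$ is arbitrary in (1.1).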

Robust inference (beyond point estimation) means that the inference
procedure achieves its intended goal over the entire contamination
neighborhood. For instance, robust confidence intervals must achieve the nominal
coverage probability of the target parameter for all the distributions in a
contamination neighborhood. Similarly, the rejection probability of robust
tests when the null hypothesis is true must be smaller than or equal to the
nominal significance level under all the distributions in the neighborhood.

Robust tests and confidence intervals have been proposed and studied
by several researchers. Huber (1965) introduced censored likelihood ratio
tests to robustify the Neyman-Pearson optimal test. Huber (1968) considered
robust confidence intervals for a location parameter $\theta$ which cover
the true parameter with the nominal probability for all distributions in
a neighborhood of the target distribution. The intervals are of the form
$(T_n - a, T_n + a)$, where $T_n$ is a location estimate. He found the estimate
$T_n$ that minimizes $a$ subject to the conditions $P(T_n < \theta - a) \le \alpha/2$ and
$P(T_n > \theta + a) \le \alpha/2$ (instead of the more natural but less tractable
condition $P(T_n < \theta - a) + P(T_n > \theta + a) \le \alpha$) for finite samples coming from
distributions in the contamination neighborhood. The optimal estimate is an
M-estimate with Huber type score function. In Huber’s approach the scale
parameter is assumed known. Fraiman, Yohai and Zamar (2001) solved a
related problem: find robust intervals $(T_n - a, T_n + a)$ of minimum length
and asymptotically correct coverage for all distributions in a contamination
neighborhood.

We now briefly discuss two asymptotic approaches to the problem of robust
inference for the case of small $\varepsilon$. The first, introduced by Huber-Carol
(1970), Rieder (1978) and Bednarski (1982), uses shrinking contamination
neighborhoods (contamination fraction of order $n^{-1/2}$) for the null
hypothesis and contiguous alternatives of order $n^{-1/2}$. The second, introduced by
Rousseeuw and Ronchetti (1981), is based on the influence function for tests,
which is used to approximate the maximum level and the minimum power
of a test in a contamination neighborhood of size $\varepsilon$, when $\varepsilon$ is small. In
particular, the approximation of the maximum level can be used to correct
the test so that the maximum level is not larger than a given value $\alpha$
for all distributions in a contamination neighborhood. For a full account of
this approach see Hampel et al. (1986) and Markatou and Ronchetti (1997).
A related approach was given by Lambert (1981), who defines an influence
function that measures the effect of the contamination on the p-value of a
test.

Morgenthaler (1986) considers a class of robust confidence intervals, called
strong confidence intervals, which keep the nominal coverage probability
conditional on the sample configuration, under two or more specified symmetric
distributions. It would seem reasonable to expect that by choosing some
extreme symmetric distributions (e.g., the normal and slash distributions),
the coverage of the interval should remain correct for other “intermediate”
symmetric distributions. Morgenthaler also considers a class of robust
confidence intervals, called bioptimal, which are robust in terms of efficiency for
two symmetric distributions. The case of asymmetric contamination is not
considered in Morgenthaler’s approach.

Rieder (1982) addresses the problem of robustifying rank tests preserving 
their nonparametric nature. He considers one-sided tests for one and two 
sample problems, showing that the least favorable distribution under a given 
fraction of contamination does not depend on the target model. Our two- 
sided modified sign test and the corresponding robust confidence interval 
can be considered extensions of Rieder’s approach. 

The rest of the paper is organized as follows. Section 1.1 briefly reviews
nonparametric intervals obtained by inverting the sign test. Section 1.2
contains our main result, Theorem 1, which shows that sign-test intervals are
not robust and paves the way for the construction of robust nonparametric
intervals for the median in Section 2. In Section 2 we also discuss
coverage robustness of confidence intervals and the associated concept of level
robustness of a test. In Section 3 we address the concept of length
robustness of a confidence interval and the associated concept of power robustness
of a test. In that section we show that the nonparametric robust confidence
interval defined in Section 2 has optimal length. In Section 4 we discuss
possible extensions and further research. The last section is the Appendix with
some proofs. Detailed proofs of our results can be found in Yohai and Zamar
(2004).

1.1. Robust nonparametric inference for the median. Let

$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$

be the order statistics of a sample $X_n = (x_1, \ldots, x_n)$ with common
distribution $F$ satisfying the following assumption.

(A1) $F$ is continuous with a unique median $\theta(F) = F^{-1}(1/2)$.

Consider the null hypothesis $H_0 : \theta = \theta_0$ and the sign test statistic

(1.2) $T_{n,\theta}(X_n) = \sum_{i=1}^{n} I(x_i - \theta > 0)$.

The interval

(1.3) $[x_{(k+1)}, x_{(n-k)})$

is obtained by inverting the acceptance region $k < T_{n,\theta_0}(X_n) < n - k$. See,
for instance, Hettmansperger (1984). The interval (1.3) is a distribution-free
$(1 - \alpha(k))100\%$ confidence interval for $\theta$, where

(1.4) $\alpha(k) = 2P(Z_n \le k)$, $Z_n \sim \mathrm{Binomial}(n, 1/2)$.

For simplicity, we will only consider levels in the set $\{\alpha(k)\}$, $k = 1, 2, \ldots, [n/2]$.
Hettmansperger and Sheather (1986) show how general levels can be obtained
by interpolating between the order statistics.
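
The classical construction can be sketched in a few lines of Python. The sketch below computes the attainable levels from (1.4), using the fact that the coverage of (1.3) is $P(k < Z_n < n-k)$, and reads the interval (1.3) off the order statistics. The names `alpha_classical` and `sign_interval` are hypothetical helpers introduced only for illustration.

```python
from math import comb

def alpha_classical(n, k):
    """alpha(k) = 2 P(Z_n <= k) for Z_n ~ Binomial(n, 1/2), as in (1.4)."""
    return 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n

def sign_interval(xs, k):
    """Endpoints of the interval [x_(k+1), x_(n-k)) in (1.3)."""
    s = sorted(xs)
    n = len(s)
    if not 0 <= k < n / 2:
        raise ValueError("k must satisfy 0 <= k < n/2")
    # Python lists are 0-based: x_(k+1) is s[k] and x_(n-k) is s[n-k-1].
    return s[k], s[n - k - 1]

# Attainable coverages 1 - alpha(k) for n = 20: only a discrete set is available.
for k in range(1, 10):
    print(k, round(1 - alpha_classical(20, k), 4))
```

The printed list shows why only levels in the set $\{\alpha(k)\}$ are considered: for $n = 20$ the coverage jumps from one attainable value to the next as $k$ changes.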

Interval (1.3) yields valid inference for the median of the contaminated
distribution, but not for the median of the target distribution. In general,
distribution-free methods do not yield valid inference for the target
distribution in the presence of asymmetric contamination. Since the median is a
very robust location parameter, $\theta(G)$ and $\theta(F)$ are generally close for all
$G$ in $\mathcal{F}_{\varepsilon}(F)$. Still, as shown by Table 1, computed using the result of
Theorem 1, the probability that (1.3) covers the target median $\theta(F)$ (and the
significance level of the associated sign test) may be severely upset.


1.2. Our main result. Theorem 1 shows that the nonparametric interval
(1.3) is not robust because its probability of covering the median of $F$ can be
much smaller than $1 - \alpha(k)$ for distributions $G$ in $\mathcal{F}_{\varepsilon}(F)$. More importantly,
it gives a simple way to modify the definition of this interval (see Section 2.2)
so that it remains nonparametric and achieves robustness.


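Theorem 1. Let $X_n = (x_1, \ldots, x_n)$ be a random sample from $G \in \mathcal{F}_{\varepsilon}(F)$
with $F$ satisfying (A1). Then,

(a)

(1.5) $\inf_{G \in \mathcal{F}_{\varepsilon}(F)} P_G(x_{(k+1)} \le \theta < x_{(n-k)}) = 1 - \alpha^*(n, k, \varepsilon)$,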


Table 1
Minimum coverage probability for contaminated samples

                  1 - α ≈ 0.95                      1 - α ≈ 0.90
    n     ε = 0    0.05    0.10    0.15     ε = 0    0.05    0.10    0.15
   20     0.959   0.954   0.938   0.912     0.885   0.876   0.849   0.804
   40     0.962   0.952   0.922   0.868     0.919   0.904   0.859   0.784
  100     0.943   0.912   0.815   0.655     0.911   0.872   0.755   0.578
  200     0.944   0.881   0.689   0.414     0.896   0.811   0.582   0.307
  500     0.946   0.789   0.376   0.074     0.902   0.702   0.279   0.043
 1000     0.946   0.636   0.108   0.002     0.906   0.537   0.068   0.001
 2000     0.948   0.385   0.006   0         0.897   0.273   0.002   0

where

(1.6) $\alpha^*(n, k, \varepsilon) = 1 - P(k < Z_n < n - k)$,

with $Z_n$ distributed as $\mathrm{Binomial}\{n, (1 - \varepsilon)/2\}$.

(b) The infimum in (1.5) is achieved for any contaminating distribution
which places all its mass to the right or left of $\theta$.

Using Theorem 1, we calculate the minimum coverage probability for the
intervals (1.3) for several values of $n$, $\alpha$ and $\varepsilon$. The results shown in Table
1 are disappointingly low, especially for large $n$. The minimum coverages
are not overly pessimistic since they are attained by any contamination fully
supported to the right (or left) of the target median.
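
The minimum coverage in (1.5) and (1.6) is a plain binomial probability, so it can be evaluated directly. The following Python sketch (the names `binom_pmf` and `alpha_star` are ours) evaluates $1 - \alpha^*(n, k, \varepsilon)$ for $n = 20$ with the classical choices $k = 5$ and $k = 6$; the output can be compared with the first row of Table 1. The direct summation is adequate for moderate $n$; very large $n$ would call for log-scale arithmetic, which is not attempted here.

```python
from math import comb

def binom_pmf(n, p, i):
    """P(Z = i) for Z ~ Binomial(n, p)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

def alpha_star(n, k, eps):
    """alpha*(n, k, eps) = 1 - P(k < Z_n < n - k), Z_n ~ Binomial(n, (1 - eps)/2)."""
    p = (1 - eps) / 2
    return 1 - sum(binom_pmf(n, p, i) for i in range(k + 1, n - k))

# Minimum coverage of the classical interval (1.3) under contamination.
for k in (5, 6):  # k = 5 and k = 6 give roughly 95% and 90% coverage at eps = 0
    print(k, [round(1 - alpha_star(20, k, eps), 3) for eps in (0.0, 0.05, 0.10, 0.15)])
```

Setting `eps = 0` recovers the classical level $\alpha(k)$ of (1.4).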

2. Coverage and level robustness. 

2.1. Coverage robustness of a confidence interval. In connection with
the preceding discussion, we now formally state the desired robustness and
nonparametric properties for the coverage probability of confidence intervals.

Definition I1 (Coverage robustness). We say that a confidence interval
$I_n = [a_n(X_n), b_n(X_n))$ has $\varepsilon$-robust coverage $1 - \alpha$ at $F$ if

(2.1) $\inf_{G \in \mathcal{F}_{\varepsilon}(F)} P_G\{a_n(X_n) \le \theta < b_n(X_n)\} = 1 - \alpha$.

A related concept of robust confidence interval was introduced by Huber
(1968). Although Huber’s objective function is not exactly equal to the
minimum coverage probability, it is closely related to it. The following definition
seems natural to convey the nonparametric nature of an interval.

Definition I2 (Nonparametric coverage robustness). We say that a
confidence interval $I_n = [a_n(X_n), b_n(X_n))$ has nonparametric $\varepsilon$-robust
coverage $1 - \alpha$ if it has $\varepsilon$-robust coverage $1 - \alpha$ at $F$ for all $F$ satisfying (A1).

2.2. An exact nonparametric $\varepsilon$-robust interval for $\theta$. We wish to
construct robust and nonparametric confidence intervals for the median of the
target distribution. Theorem 1 derives the exact finite sample least favorable
distribution (under contamination neighborhoods) for (1.3) and shows that
this distribution does not depend on the target distribution $F$. This theorem
also tells us how to modify the interval (1.3) so that it attains nonparametric
$\varepsilon$-robust coverage $1 - \alpha$. Namely, the integer $k$ must satisfy the equation

(2.2) $\alpha^*(n, k, \varepsilon) = \alpha$.

Note that the definition (2.2) of $k$ is based on the distribution
$\mathrm{Binomial}\{n, (1 - \varepsilon)/2\}$ instead of the $\mathrm{Binomial}(n, 1/2)$. As in the classical case, it is not
possible to achieve all the desired exact coverage probabilities $1 - \alpha$. For
simplicity, we restrict attention to integers

(2.3) $k_n = k_n(n, \alpha) = \arg\min_k |\alpha^*(n, k, \varepsilon) - \alpha|$,

which clearly satisfies

$\lim_{n \to \infty} \alpha^*(n, k_n, \varepsilon) = \alpha$.

In summary, the modified interval covers the median of the target
distribution with a guaranteed confidence level for each $n$ and for all the
distributions in a contamination neighborhood of a general target distribution.
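
A minimal Python sketch of the construction in (2.2) and (2.3) follows. It searches over $k$ for the value whose exact minimum coverage is closest to the nominal one and returns the corresponding order-statistic interval. The function names and the search range $0 \le k \le (n-1)/2$ (chosen so that the interval is always well defined) are assumptions of this sketch, not the authors' S-PLUS code.

```python
from math import comb

def alpha_star(n, k, eps):
    """alpha*(n, k, eps) from (1.6), with Z_n ~ Binomial(n, (1 - eps)/2)."""
    p = (1 - eps) / 2
    inside = sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
                 for i in range(k + 1, n - k))
    return 1 - inside

def choose_k(n, alpha, eps):
    """k_n from (2.3): the k minimising |alpha*(n, k, eps) - alpha|."""
    candidates = range(0, (n - 1) // 2 + 1)
    return min(candidates, key=lambda k: abs(alpha_star(n, k, eps) - alpha))

def robust_interval(xs, alpha, eps):
    """Return (lower, upper, exact minimum coverage) of [x_(k+1), x_(n-k))."""
    s = sorted(xs)
    n = len(s)
    k = choose_k(n, alpha, eps)
    return s[k], s[n - k - 1], 1 - alpha_star(n, k, eps)

data = list(range(1, 101))  # placeholder sample of size 100
for eps in (0.0, 0.05, 0.10):
    print(eps, choose_k(len(data), 0.05, eps), robust_interval(data, 0.05, eps))
```

As the design contamination grows, the selected $k$ decreases and the interval widens, which is the price paid for keeping the minimum coverage near $1 - \alpha$.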

2.3. Level robustness of a test. Given the well-known duality between
confidence intervals and tests, it is natural to expect that the nonparametric
robust confidence intervals introduced in the previous section will
automatically yield nonparametric tests with good robustness properties.

Following Huber (1965), we next define the concept of $\varepsilon$-robust level-$\alpha$
test.


Definition T1 (Level robustness). Let $F$ be a fixed distribution
satisfying (A1) with $\theta = \theta_0$. A nonrandomized test $\varphi_{\theta_0}$ has $\varepsilon$-robust level $\alpha$ (for
$H_0$ versus $H_1$) at $F$ if

$\sup_{G \in \mathcal{F}_{\varepsilon}(F)} P_G\{\varphi_{\theta_0}(X_n) = 1\} = \alpha$.

This property ensures the validity of the test over the entire
neighborhood $\mathcal{F}_{\varepsilon}(F)$. That is, the probability of rejecting $H_0$ is less than or equal
to $\alpha$ not only at $F$, but also at any $G$ in $\mathcal{F}_{\varepsilon}(F)$.


Definition T2 (Nonparametric level robustness). We say that a
nonrandomized test $\varphi_{\theta_0}$ has nonparametric $\varepsilon$-robust level $\alpha$ (for $H_0$ versus $H_1$)
if $\varphi_{\theta_0}$ has $\varepsilon$-robust level $\alpha$ at $F$ for all $F$ satisfying (A1) with $\theta = \theta_0$.


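2.4. An exact nonparametric $\varepsilon$-robust test. It is immediate that T1 (T2)
holds for a family of tests if and only if I1 (I2) holds for the associated
sequence of intervals. In particular, the $\varepsilon$-robust sign test $\varphi_{\theta_0}$ of level $\alpha$ can
be derived from the nonparametric $\varepsilon$-robust interval $I_{\alpha}(X_n)$ as follows:

$\varphi_{\theta_0}(X_n) = \begin{cases} 1, & \text{if } \theta_0 \notin I_{\alpha}(X_n), \\ 0, & \text{if } \theta_0 \in I_{\alpha}(X_n), \end{cases}$

and, therefore,

(2.4) $\varphi_{\theta_0}(X_n) = \begin{cases} 1, & \text{if } T_{n,\theta_0}(X_n) \le k \text{ or } T_{n,\theta_0}(X_n) \ge n - k, \\ 0, & \text{if } k < T_{n,\theta_0}(X_n) < n - k, \end{cases}$

where $T_{n,\theta}(X_n)$ is given by (1.2) and $\alpha^*(n, k, \varepsilon) = \alpha$.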
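
The test (2.4) can be sketched in Python as follows, with $k$ taken from (2.3) since (2.2) rarely has an exact integer solution. The helper names are hypothetical, and the sketch is meant only to illustrate the duality with the interval $[x_{(k+1)}, x_{(n-k)})$.

```python
from math import comb

def alpha_star(n, k, eps):
    """alpha*(n, k, eps) from (1.6)."""
    p = (1 - eps) / 2
    return 1 - sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
                   for i in range(k + 1, n - k))

def choose_k(n, alpha, eps):
    """k_n from (2.3); the search range 0..(n-1)//2 is an assumption of this sketch."""
    return min(range(0, (n - 1) // 2 + 1),
               key=lambda k: abs(alpha_star(n, k, eps) - alpha))

def sign_statistic(xs, theta0):
    """T_{n,theta0}(X_n) in (1.2): the number of observations above theta0."""
    return sum(1 for x in xs if x - theta0 > 0)

def robust_sign_test(xs, theta0, alpha, eps):
    """Return 1 (reject H0) or 0 (do not reject), following (2.4)."""
    n = len(xs)
    k = choose_k(n, alpha, eps)
    t = sign_statistic(xs, theta0)
    return 1 if (t <= k or t >= n - k) else 0

data = list(range(1, 21))  # placeholder sample; here k = 5 and the interval is [6, 15)
print([robust_sign_test(data, th, 0.05, 0.10) for th in (5.5, 6, 10.5, 15)])
```

With the placeholder data, the test rejects exactly those values of $\theta_0$ that fall outside $[6, 15)$.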
2.5. Contamination tolerance of a test. In some cases a test may be
significant due to the presence of a small fraction of contamination in the
data. To what extent might this be the case in a given application? The
significance of the test would deliver a stronger message if we could discard
the possibility that the results are due to contamination in the data. This
motivates the following definition.

Definition T3 (Contamination tolerance). Consider a family of tests
$\varphi_{\theta_0,\varepsilon}$ for $H_0 : \theta = \theta_0$ versus $H_1 : \theta \ne \theta_0$, $0 \le \varepsilon < 0.5$, such that (i) $\varphi_{\theta_0,\varepsilon}$ is
$\varepsilon$-robust of level $\alpha$ and (ii) $\varepsilon_1 < \varepsilon_2$ implies $\varphi_{\theta_0,\varepsilon_1}(X_n) \ge \varphi_{\theta_0,\varepsilon_2}(X_n)$. Given a
sample such that $\varphi_{\theta_0,0}(X_n) = 1$, the contamination tolerance for
significance level $\alpha$ at $X_n$ [denoted by $T_{\alpha} = T_{\alpha}(X_n)$] is defined as

$T_{\alpha}(X_n) = \sup\{\varepsilon : \varphi_{\theta_0,\varepsilon}(X_n) = 1\}$.

In other words, the contamination tolerance for significance level $\alpha$ is the
maximum level of contamination $\varepsilon$ such that the $\varepsilon$-robust test of level $\alpha$
still rejects the null hypothesis. Therefore, if we believe that the fraction of
contamination in the data is smaller than $T_{\alpha}$, it is safe to reject the null
hypothesis, even if we do not know the exact contamination size.
Consequently, a large $T_{\alpha}$ (with small $\alpha$) can be taken as strong evidence against
the null hypothesis.

Consider now the family of $\varepsilon$-robust sign tests given by (2.4). Then the
value of $T_{\alpha}(X_n)$ satisfies the equation

(2.5) $\alpha^*\{n, r_n(X_n), T_{\alpha}\} = \alpha$,

where $r_n(X_n) = \min\{T_{n,\theta_0}(X_n), n - T_{n,\theta_0}(X_n)\}$. Notice that equation (2.5)
has a solution if and only if $\alpha^*\{n, r_n(X_n), 0\} \le \alpha$, that is, if and only if
the null hypothesis is rejected under the assumption of a zero fraction of
contamination (perfect data). If this condition is not satisfied, we would not
reject $H_0$ even if the classical sign test is used.
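
Equation (2.5) has no closed-form solution, but since $\alpha^*(n, k, \varepsilon)$ is nondecreasing in $\varepsilon$ (a consequence of Lemma 1 in the Appendix) it can be solved by bisection. The Python sketch below does this. The function name, the cap of the search at $\varepsilon = 0.5$ and the returned `None` for the non-significant case are conventions assumed here for illustration.

```python
from math import comb

def alpha_star(n, k, eps):
    """alpha*(n, k, eps) from (1.6)."""
    p = (1 - eps) / 2
    return 1 - sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
                   for i in range(k + 1, n - k))

def contamination_tolerance(n, r, alpha, eps_max=0.5, tol=1e-10):
    """Solve alpha*(n, r, eps) = alpha for eps by bisection, as in (2.5).

    r is r_n(X_n) = min(T, n - T). Returns None when the classical sign
    test (eps = 0) does not reject, and eps_max when the root exceeds it.
    """
    if alpha_star(n, r, 0.0) > alpha:
        return None
    if alpha_star(n, r, eps_max) <= alpha:
        return eps_max
    lo, hi = 0.0, eps_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if alpha_star(n, r, mid) <= alpha:
            lo = mid   # test still rejects at contamination level mid
        else:
            hi = mid
    return lo

# Example: n = 20 observations, only 2 of them on one side of theta0.
print(contamination_tolerance(20, 2, 0.05))
```

A call with a value of `r` for which the classical test is not significant, for example `contamination_tolerance(20, 8, 0.05)`, returns `None`, in line with the existence condition stated after (2.5).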

3. Length and power robustness. Definitions I1 and I2 guarantee the
correct coverage level of the interval. However, robust confidence intervals
should not only have correct level but also remain informative under
contamination. Definition I3 formalizes this robustness requirement in terms of
the concept of maximum asymptotic length of the interval introduced below.

For the following discussion we must distinguish between the design
contamination size $\varepsilon$ used to construct the confidence interval (so that it satisfies
Definition I1) and the real contamination size, denoted by $\delta$.

Given a sequence of intervals $I_n = [a_n(X_n), b_n(X_n))$, we consider the
maximum asymptotic length under contamination of size $\delta$ at $F$,

(3.1) $L\{I_n, F, \delta\} = \sup_{G \in \mathcal{F}_{\delta}(F)} \operatorname{ess\,sup} \limsup_{n} (b_n(X_n) - a_n(X_n))$,

where ess sup stands for essential supremum. The ess sup is applied for greater
generality; however, in all cases we are aware of (including the interval based
on the revised sign test), $\limsup_n (b_n(X_n) - a_n(X_n))$ is a constant (finite or
infinite) and, therefore, ess sup is not necessary. Notice that if the interval
length is location invariant, so is the above definition.

The intuitive notion of remaining “informative under contamination of
size $\delta$” is captured by the following definition. Notice that our definition of
length breakdown point is the confidence interval counterpart of Hampel’s
(1971) breakdown point of a point estimate.

Definition I3 (Length robustness). We say that the sequence of
intervals $I_n = [a_n(X_n), b_n(X_n))$, $n \ge n_0$, has $\delta$-robust length at $F$ if
$L\{I_n, F, \delta\} < \infty$. The corresponding length breakdown point at $F$ is given by

$\delta^*\{I_n, F\} = \sup\{\delta : L\{I_n, F, \delta\} < \infty\}$.

The next theorem establishes the asymptotic length-robustness of the
modified sign test interval.


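Theorem 2. Suppose that $F$ is continuous and has a symmetric (around
$\theta$) and unimodal density. Let $0 < \alpha < 1$ and $0 < \varepsilon < 1/2$ be fixed and
consider the sequence of intervals $I_n = [x_{(k_n+1)}, x_{(n-k_n)})$ with $k_n$ given by (2.3).
Then:

1. For $0 \le \delta < (1 - \varepsilon)/2$,

$L\{I_n, F, \delta\} = F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\} - F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\}$.

2. $\delta^*\{I_n, F\} = (1 - \varepsilon)/2$.

3. The sequence of intervals $I_n$ has $\varepsilon$-robust length if and only if $\varepsilon < 1/3$.

4. Let $I_n = [A_n(X_n), B_n(X_n))$ be a sequence of confidence intervals such that

$\inf_{G \in \mathcal{F}_{\varepsilon}(G_0)} P_G\{A_n(X_n) \le G_0^{-1}(1/2) < B_n(X_n)\} = 1 - \alpha$

for any continuous distribution $G_0$. Suppose that $\lim_{n \to \infty} A_n(X_n) = A_0$
and $\lim_{n \to \infty} B_n(X_n) = B_0$ almost surely when the sample comes from $F$.
Then $B_0 \ge F^{-1}((1 + \varepsilon)/2)$ and $A_0 \le F^{-1}((1 - \varepsilon)/2)$.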


Table 2
Coverage probability (CP) and expected length (EL) for robust confidence
interval with approximate 95% coverage probability

            ε = 0              ε = 0.05                 ε = 0.10
    n     CP     ELU       CP     ELU    ELC        CP     ELU    ELC
   20   0.959   1.22     0.954   1.22   1.3       0.938   1.24   2.52
   40   0.962   0.84     0.952   0.83   0.89      0.960   0.97   1.13
   60   0.948   0.64     0.961   0.72   0.76      0.955   0.81   0.92
   80   0.943   0.54     0.949   0.60   0.64      0.955   0.73   0.84
  100   0.943   0.48     0.941   0.53   0.56      0.957   0.69   0.78
  200   0.944   0.34     0.947   0.42   0.44      0.949   0.55   0.61
  500   0.946   0.22     0.947   0.31   0.32      0.952   0.44   0.50
 1000   0.946   0.15     0.947   0.25   0.27      0.948   0.38   0.43
 2000   0.948   0.11     0.949   0.22   0.23      0.950   0.34   0.39

As one may have expected, the maximum asymptotic length of the
sign-test-based intervals depends on the design and actual fractions of
contamination, $\varepsilon$ and $\delta$. Finite maximum lengths are obtained provided $\delta < (1 - \varepsilon)/2$.
Therefore, length breakdown occurs when $\delta = (1 - \varepsilon)/2$. Since the
length-breakdown point $\delta^* = (1 - \varepsilon)/2$ is a decreasing function of $\varepsilon$, there
is a trade-off between the coverage-robustness and the length-robustness of
the sign-test-based intervals. Requiring a finite length when the actual
contamination equals the design contamination ($\delta = \varepsilon$) gives $\varepsilon < (1 - \varepsilon)/2$,
which naturally sets an upper bound of 1/3 on
the possible choices of design-contamination fractions in practice. Part 4
shows that in the case of uncontaminated data (i.e., $\delta = 0$), our interval is
efficient in that it has the smallest possible asymptotic length among all
nonparametric $\varepsilon$-robust confidence intervals for the median whose upper
and lower limits converge. Notice that convergence of the interval limits is
a weak assumption satisfied by all known confidence intervals.
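
Parts 1 to 3 of Theorem 2 are easy to evaluate numerically. The Python sketch below does so for a standard normal target distribution, an assumption made only for this illustration; the function names are ours. The lengths it produces for $\delta = 0$ and for $\delta = \varepsilon$ can be compared with the nonparametric (NP) entries of Table 4.

```python
from statistics import NormalDist

def max_asymptotic_length(eps, delta, inv_cdf=NormalDist().inv_cdf):
    """Maximum asymptotic length of Theorem 2, part 1.

    eps is the design contamination, delta the actual contamination, and
    inv_cdf the quantile function F^{-1} (standard normal by default).
    Returns infinity at or beyond the length breakdown point.
    """
    upper = (1 + eps) / (2 * (1 - delta))
    lower = (1 - eps) / (2 * (1 - delta))
    if upper >= 1:
        return float("inf")
    return inv_cdf(upper) - inv_cdf(lower)

def length_breakdown_point(eps):
    """delta* = (1 - eps) / 2 from Theorem 2, part 2."""
    return (1 - eps) / 2

for eps in (0.05, 0.10, 0.15, 0.20):
    print(eps,
          round(max_asymptotic_length(eps, 0.0), 3),   # uncontaminated data
          round(max_asymptotic_length(eps, eps), 3),   # delta equal to eps
          length_breakdown_point(eps))
```

Passing a different quantile function through `inv_cdf` gives the corresponding lengths for any other symmetric unimodal target distribution.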

3.1. Numerical results. We wrote a simple S-PLUS function, available
on-line at http://hajek.stat.ubc.ca/~ruben/codel, which for a given sample
$X_n$, significance level $\alpha$, and design contamination fraction $\varepsilon$, reports the
integer $k_n$, the robust interval $[x_{(k_n+1)}, x_{(n-k_n)})$ and its exact minimum
coverage probability, $1 - \alpha^*(n, k_n, \varepsilon)$. Using this function, we carried out a Monte
Carlo simulation study to determine the increase in expected length for the
robust nonparametric intervals with $k_n$ given by (2.3).

We consider two approximate coverage probabilities, 95% (Table 2) and
90% (Table 3), and three contamination levels, $\varepsilon = 0$, 0.05 and 0.10. The case
$\varepsilon = 0$ corresponds to confidence intervals based on the classical sign test.
The tables display the exact infimum coverage probabilities (CP) and
average lengths (EL). The average lengths of the robust confidence intervals are
computed under two scenarios: uncontaminated (ELU) and contaminated
samples (ELC). In the latter case, the fraction of contamination ($\delta$) equals
the design contamination ($\varepsilon$). The contamination is placed at the least
favorable location, which, as shown in the proof of Theorem 2, corresponds
to $H = \delta_y$ in (1.1) with $y \to \pm\infty$. Naturally, the percent increase in average
length is larger for larger sample sizes, when the effect of sampling
variability is overcome by the effect of contamination bias. The average lengths are
computed using 8000 replications.
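
A simulation of this kind can be sketched in Python as follows. This is not the authors' S-PLUS code: the standard normal target, the finite stand-in `far = 1e6` for a contaminating point mass drifting to infinity, the number of replications and the seeds are all assumptions of the sketch, so its output should only be expected to be of the same order as the ELU and ELC columns.

```python
import random
from math import comb

def alpha_star(n, k, eps):
    """alpha*(n, k, eps) from (1.6)."""
    p = (1 - eps) / 2
    return 1 - sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
                   for i in range(k + 1, n - k))

def choose_k(n, alpha, eps):
    """k_n from (2.3); search range 0..(n-1)//2 is an assumption of this sketch."""
    return min(range(0, (n - 1) // 2 + 1),
               key=lambda k: abs(alpha_star(n, k, eps) - alpha))

def mean_length(n, eps, delta, alpha=0.05, reps=2000, seed=0, far=1e6):
    """Monte Carlo average length of [x_(k+1), x_(n-k)) with k = k_n.

    Samples come from (1 - delta) N(0, 1) + delta * (point mass at `far`).
    """
    rng = random.Random(seed)
    k = choose_k(n, alpha, eps)
    total = 0.0
    for _ in range(reps):
        xs = sorted(far if rng.random() < delta else rng.gauss(0.0, 1.0)
                    for _ in range(n))
        total += xs[n - k - 1] - xs[k]
    return total / reps

print(round(mean_length(20, 0.0, 0.0), 2))                # classical interval
print(round(mean_length(200, 0.10, 0.0, reps=500), 2))    # robust, clean data
print(round(mean_length(200, 0.10, 0.10, reps=500), 2))   # robust, contaminated
```

For small $n$ a contaminated sample occasionally contains more than $k_n$ contaminating points, in which case the upper endpoint equals `far`; a smaller value of `far` keeps the average length interpretable in that regime.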
In Table 4 we compare the asymptotic length of the nonparametric robust
confidence intervals with the limiting length of the asymptotic parametric
robust confidence intervals proposed by Huber (1968) and Fraiman, Yohai and Zamar
(2001). The latter were proposed for a contamination neighborhood of the
normal distribution and have limiting length equal to $2\Phi^{-1}[1/\{2(1 - \varepsilon)\}]$,
which is twice the maximum asymptotic bias of the median over the
contamination neighborhood. We calculated the limiting lengths for both proposals
under the normal model and under the least favorable contaminating
distribution in $\mathcal{F}_{\varepsilon}(\Phi)$.

Notice that under the standard normal, the nonparametric robust intervals
have smaller expected length for all the considered values of $\varepsilon$. The expected
lengths are practically equal for the least favorable contamination, with a
small advantage for the parametric interval.

3.2. Power robustness of a test. As in the case of confidence intervals,
we must distinguish between the design contamination $\varepsilon$ used to construct
the test and the actual contamination $\delta$. The following definition formalizes
the concept of robust power behavior of a test under contamination of size $\delta$.


Table 3
Coverage probability (CP) and expected length (EL) for robust confidence
interval with approximate 90% coverage probability

            ε = 0              ε = 0.05                 ε = 0.10
    n     CP     ELU       CP     ELU    ELC        CP     ELU    ELC
   20   0.885   0.89     0.876   0.90   0.96      0.938   1.20   2.40
   40   0.919   0.70     0.904   0.70   0.74      0.922   0.83   0.95
   60   0.908   0.55     0.883   0.55   0.58      0.923   0.72   0.82
   80   0.907   0.47     0.918   0.54   0.57      0.891   0.60   0.68
  100   0.911   0.43     0.912   0.48   0.51      0.904   0.58   0.66
  200   0.896   0.29     0.908   0.36   0.39      0.912   0.49   0.56
  500   0.902   0.19     0.895   0.27   0.28      0.904   0.40   0.45
 1000   0.906   0.13     0.903   0.23   0.24      0.904   0.36   0.40
 2000   0.897   0.09     0.899   0.20   0.21      0.900   0.32   0.36


Table 4
Expected length of parametric (P) and nonparametric (NP) robust intervals

                      ε = 0.05        ε = 0.10        ε = 0.15        ε = 0.20
 Distribution          P      NP       P      NP       P      NP       P      NP
 Standard Normal     0.132  0.125    0.279  0.251    0.446  0.378    0.637  0.507
 Least Favorable     0.132  0.132    0.279  0.282    0.446  0.458    0.637  0.674

Definition T4 (Power robustness). Let $F$ be a fixed distribution
satisfying (A1) with $\theta = \theta_0$ and let $F_{\lambda}(x) = F(x - \lambda)$. We say that a sequence
of nonrandomized tests $\varphi_{n,\theta_0}$, $n \ge n_0$, has $\delta$-robust power (for $H_0$ versus
$H_1$) at $F$ if there exists $K$ such that

(3.2) $\inf_{G \in \mathcal{F}_{\delta}(F_{\lambda})} \lim_{n \to \infty} P_G\{\varphi_{n,\theta_0}(X_n) = 1\} = 1$ for all $|\lambda| > K$.

This property ensures the consistency of the sequence of nonrandomized
tests $\{\varphi_{n,\theta_0}\}$, uniformly over the neighborhood $\mathcal{F}_{\delta}(F_{\lambda})$, provided $\lambda = \theta - \theta_0$
is large enough in absolute value. Definition T4 suggests the following measure of asymptotic
power robustness of the sequence $\{\varphi_{n,\theta_0}(X_n)\}$ of tests, under contamination
of size $\delta$.

Definition T5 (Power distance). Let $F$ be a fixed distribution
satisfying (A1) with $\theta = \theta_0$. The $\delta$-consistency distance of a sequence of tests $\varphi_{n,\theta_0}$,
$n \ge n_0$, at $F$, denoted by $K\{(\varphi_{n,\theta_0}), F, \delta\}$, is the infimum of the set of values $K$
for which (3.2) holds.

The concept of breakdown point of a test was first considered by Ylvisaker
(1977) and Rieder (1982). The latter defined and computed the breakdown
point of rank and M-tests. Our Definition T5 is closely related to the concept
of power breakdown point of a test introduced by He, Simpson and Portnoy
(1990). In fact, for a given $\theta \ne \theta_0$, the power breakdown point at $\theta$ is the
value of $\delta$ such that $|\theta - \theta_0| = K\{(\varphi_{n,\theta_0}), F, \delta\}$.

Next we define a new concept of breakdown point for a test which does
not depend on a particular value of $\theta$ and is directly associated with the
definition of length breakdown point of a confidence interval given in Section 3.

Definition T6 (Power breakdown). Let $F$ be a fixed distribution
satisfying $\theta_0 = F^{-1}(1/2)$. The power breakdown point $\delta^*$ of the sequence of
nonrandomized tests $\varphi_{n,\theta_0}$, $n \ge n_0$, at $F$ is the supremum of the set of
values $\delta$ for which the sequence of tests has $\delta$-robust power.

The power-robustness properties of the robustified sign test given by (2.4) 
are established in the next theorem. They are closely related to the length- 
robustness properties of the confidence intervals established in Theorem 2. 

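Theorem 3. Let $0 < \alpha < 1$ and $0 < \varepsilon < 1/2$ be fixed and consider the
sequence of tests $\varphi_{n,\theta_0}$, $n \ge n_0$, for $H_0 : \theta = \theta_0$ versus $H_1 : \theta \ne \theta_0$ given by (2.4)
and $k_n$ given by (2.3). Suppose that $F$ is continuous and has a symmetric
(around $\theta$) and unimodal density. Then:

1. The $\delta$-consistency distance for the sequence of tests $\varphi_{n,\theta_0}$, $n \ge n_0$, is
$K\{(\varphi_{n,\theta_0}), F, \delta\} = F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\} - \theta_0$.

2. The power breakdown point of the sequence of tests $\varphi_{n,\theta_0}$, $n \ge n_0$, is
$\delta^* = (1 - \varepsilon)/2$.

3. The sequence of tests $\varphi_{n,\theta_0}$, $n \ge n_0$, has $\varepsilon$-robust power if and only if
$\varepsilon < 1/3$.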

4. Possible extensions and further research. Robust nonparametric
confidence intervals and tests for a location parameter could be defined using
other rank statistics such as the signed Wilcoxon test statistic. In this case
the parameter of interest would be defined as the center of symmetry of
the target distribution, and, therefore, the target distribution (but not the
observed distribution) would need to be symmetric. The main theoretical
problem, which we were not able to solve, is the derivation of the least
favorable distribution that gives the minimum coverage. We conjecture that
this distribution is the one that puts all its mass at $+\infty$ or at $-\infty$.

We are currently studying possible extensions of our procedure to the case 
of two samples and to the case of simple linear regression. For the two-sample 
problem, we wish to construct robust nonparametric confidence intervals for 
the shift parameter, based on the two sample median test statistic. For the 
simple linear regression problem, we wish to construct robust nonparametric 
confidence intervals for the slope parameter, based on the Brown and Mood 
(1951) test statistic, which is a natural extension of the sign test statistic. 

APPENDIX 

Lemma 1 is needed to prove Theorem 1. The proof of this lemma can be 
found as Lemma 4 of Yohai and Zamar (2004). 

Lemma 1. Suppose that $X$ is $\mathrm{Bin}(n, p)$ and let

$h(p) = P(k < X < n - k) = \sum_{i=k+1}^{n-k-1} \binom{n}{i} p^i (1 - p)^{n-i}$.

Then (i) $h(p) = h(1 - p)$, (ii) $h(p)$ is nondecreasing on $0 \le p \le 1/2$ for all
$k = 0, 1, \ldots, [n/2]$.
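
Lemma 1 can be checked numerically on a grid. The Python sketch below does this for $n = 20$ and $k = 5$, taking $h(p) = P(k < X < n - k)$ as in the proof of Theorem 1. It is a sanity check under these assumed values, not a substitute for the proof in Yohai and Zamar (2004).

```python
from math import comb

def h(n, k, p):
    """h(p) = P(k < X < n - k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1, n - k))

n, k = 20, 5
grid = [j / 100 for j in range(0, 51)]          # p = 0.00, 0.01, ..., 0.50
values = [h(n, k, p) for p in grid]

# (i) symmetry about p = 1/2, (ii) monotonicity on [0, 1/2], up to rounding error.
symmetric = all(abs(h(n, k, p) - h(n, k, 1 - p)) < 1e-12 for p in grid)
monotone = all(b >= a - 1e-12 for a, b in zip(values, values[1:]))
print(symmetric, monotone)
```

Property (ii) is what makes $p = (1 - \varepsilon)/2$ the least favorable success probability in Theorem 1.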

Proof of Theorem 1. We have

$P_G(x_{(k+1)} \le \theta < x_{(n-k)}) = P_G\{k < T_{n,\theta}(X_n) < n - k\} = P(k < Z_n < n - k)$,

where $Z_n$ is distributed as $\mathrm{Binomial}\{n, 1 - G(\theta)\}$. On the other hand,
$G(\theta) = (1 - \varepsilon)F(\theta) + \varepsilon H(\theta)$ and so

$\frac{1-\varepsilon}{2} = (1 - \varepsilon)F(\theta) \le G(\theta) \le (1 - \varepsilon)F(\theta) + \varepsilon = \frac{1+\varepsilon}{2}$.

Therefore, for all $G \in \mathcal{F}_{\varepsilon}(F)$, $(1 - \varepsilon)/2 \le 1 - G(\theta) \le (1 + \varepsilon)/2$ with the lower
and upper bounds attained when $H$ concentrates all its mass to the left
and right of $\theta$, respectively. The theorem now follows from Lemma 1. □


The following lemma is needed to prove Theorem 2. For a proof of Lemma 2 
see Lemma 5 in Yohai and Zamar (2004). 


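Lemma 2. Let $X_n = (x_1, \ldots, x_n)$ be i.i.d. random variables with
distribution $G$. Consider the sequence of intervals $I_n(X_n) = [x_{(k_n+1)}, x_{(n-k_n)})$ with
lengths $l_n(X_n) = x_{(n-k_n)} - x_{(k_n+1)}$ and levels $\alpha^*(n, k_n, \varepsilon) \to \alpha$, $0 < \alpha < 1$.
Then $\lim_{n \to \infty} l_n(X_n) = G^{-1}\{(1 + \varepsilon)/2\} - G^{-1}\{(1 - \varepsilon)/2\} = L^*(G, \varepsilon)$.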


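Proof of Theorem 2. Put $L^*(G, \varepsilon) = G^{-1}\{(1 + \varepsilon)/2\} - G^{-1}\{(1 - \varepsilon)/2\}$.
By Lemma 2, to prove part 1 it is enough to show

(A.1) $\sup_{G \in \mathcal{F}_{\delta}(F)} L^*(G, \varepsilon) = F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\} - F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\}$.

We start by showing that

(A.2) $\sup_{G \in \mathcal{F}_{\delta}(F)} L^*(G, \varepsilon) \le F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\} - F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\}$.

Let $G = (1 - \delta)F + \delta H$ and, taking $\theta = 0$ without loss of generality, put

$a_1 = G^{-1}\left(\frac{1-\varepsilon}{2}\right)$, $a_2 = G^{-1}\left(\frac{1+\varepsilon}{2}\right)$,

$a_3 = F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\}$, $a_4 = F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\}$.

We will show first that

(A.3) $F(a_2) - F(a_1) \le F(a_4) - F(a_3)$.

This follows because, by definition of quantiles,

$\varepsilon = G(a_2) - G(a_1) = (1 - \delta)F(a_2) + \delta H(a_2) - (1 - \delta)F(a_1) - \delta H(a_1)$
$= (1 - \delta)\{F(a_2) - F(a_1)\} + \delta\{H(a_2) - H(a_1)\}$
$\ge (1 - \delta)\{F(a_2) - F(a_1)\}$,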
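and, therefore,

(A.4) $F(a_2) - F(a_1) \le \frac{\varepsilon}{1-\delta}$.

On the other hand,

(A.5) $F(a_4) - F(a_3) = \frac{1+\varepsilon}{2(1-\delta)} - \frac{1-\varepsilon}{2(1-\delta)} = \frac{\varepsilon}{1-\delta}$.

Therefore, (A.3) follows from (A.4) and (A.5). To complete the proof of (A.2),
we consider two cases:

Case 1 ($\delta \le \varepsilon$). First notice that:

(i) $1/2 \ge (1 - \varepsilon)/\{2(1 - \delta)\}$ implies that

$0 = F^{-1}\left(\frac{1}{2}\right) \ge F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\} = a_3$;

(ii) $(1 - \varepsilon)/2 = G(a_1) \ge (1 - \delta)F(a_1)$ implies that

$a_1 \le F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\} = a_3$.

By (A.3),

(A.6) $F(a_4) - F(a_2) \ge F(a_3) - F(a_1)$.

Given the symmetry and unimodality of $F$, (A.2) follows from (A.6) if we
can show that

(A.7) $a_2 \ge -a_3$.

To prove (A.7), we first notice the identity

(A.8) $\frac{\varepsilon-\delta}{2(1-\delta)} = \frac{1}{2} - \frac{1-\varepsilon}{2(1-\delta)} = \frac{1+\varepsilon-2\delta}{2(1-\delta)} - \frac{1}{2}$.

Symmetry of $F$ and (A.8) imply

(A.9) $F^{-1}\left\{\frac{1+\varepsilon-2\delta}{2(1-\delta)}\right\} = -F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\} = -a_3$.

Moreover, $(1 + \varepsilon)/2 = G(a_2) \le (1 - \delta)F(a_2) + \delta$ implies

(A.10) $a_2 \ge F^{-1}\left\{\frac{1+\varepsilon-2\delta}{2(1-\delta)}\right\}$.

Equation (A.7) follows now from (A.9) and (A.10).

Case 2 ($\delta > \varepsilon$). Since in this case $1/2 < (1 - \varepsilon)/\{2(1 - \delta)\}$, we have

(A.11) $0 = F^{-1}\left(\frac{1}{2}\right) < F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\} = a_3$.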
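Moreover, $(1 - \varepsilon)/2 = G(a_1) \le (1 - \delta)F(a_1) + \delta$ implies

(A.12) $a_1 \ge F^{-1}\left\{\frac{1-\varepsilon-2\delta}{2(1-\delta)}\right\}$.

We have the identity

(A.13) $\frac{\delta+\varepsilon}{2(1-\delta)} = \frac{1}{2} - \frac{1-\varepsilon-2\delta}{2(1-\delta)} = \frac{1+\varepsilon}{2(1-\delta)} - \frac{1}{2}$.

Equations (A.12) and (A.13) give

(A.14) $a_1 \ge F^{-1}\left\{\frac{1-\varepsilon-2\delta}{2(1-\delta)}\right\} = -F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\} = -a_4$.

The inequality $(1 - \varepsilon)/2 = G(a_1) \ge (1 - \delta)F(a_1)$ implies

(A.15) $a_1 \le F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\} = a_3$.

Equations (A.14) and (A.15) give

(A.16) $-a_4 \le a_1 \le a_3$.

Then (A.2) follows from (A.16) and the unimodality and the symmetry
of $F$.

Let $\delta_m$ be the point mass distribution at $m$. Then

$\lim_{m \to \infty} L^*\{(1 - \delta)F + \delta\,\delta_m, \varepsilon\} = F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\} - F^{-1}\left\{\frac{1-\varepsilon}{2(1-\delta)}\right\}$.

This together with (A.2) implies (A.1). The proofs of parts 2 and 3 are
straightforward. To prove part 2 just notice that the maximum interval
length is finite provided that $(1 + \varepsilon)/\{2(1 - \delta)\} < 1$. Part 3 follows
immediately from part 2.

Finally, to prove part 4, let $G_0$ be defined by

$G_0(x) = \begin{cases} 0, & \text{if } x < F^{-1}(\varepsilon), \\ \dfrac{F(x) - \varepsilon}{1 - \varepsilon}, & \text{if } x \ge F^{-1}(\varepsilon), \end{cases}$

and $H$ be defined by

$H(x) = \begin{cases} \dfrac{F(x)}{\varepsilon}, & \text{if } x < F^{-1}(\varepsilon), \\ 1, & \text{if } x \ge F^{-1}(\varepsilon). \end{cases}$

Then observe that $F = (1 - \varepsilon)G_0 + \varepsilon H$, and, therefore, $F \in \mathcal{F}_{\varepsilon}(G_0)$.
Consequently, $G_0^{-1}(1/2) = F^{-1}((1 + \varepsilon)/2) \in [A_0, B_0]$ and, therefore,
$B_0 \ge F^{-1}((1 + \varepsilon)/2)$.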
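Put

$G_0(x) = \begin{cases} \dfrac{F(x)}{1 - \varepsilon}, & \text{if } x < F^{-1}(1 - \varepsilon), \\ 1, & \text{if } x \ge F^{-1}(1 - \varepsilon), \end{cases}$

$H(x) = \begin{cases} 0, & \text{if } x < F^{-1}(1 - \varepsilon), \\ \dfrac{F(x) - (1 - \varepsilon)}{\varepsilon}, & \text{if } x \ge F^{-1}(1 - \varepsilon). \end{cases}$

We also have that $F = (1 - \varepsilon)G_0 + \varepsilon H$, and, therefore, $F \in \mathcal{F}_{\varepsilon}(G_0)$. Then
$G_0^{-1}(1/2) = F^{-1}((1 - \varepsilon)/2) \in [A_0, B_0]$ and, therefore,
$A_0 \le F^{-1}((1 - \varepsilon)/2)$. □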


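Proof of Theorem 3. We can assume without loss of generality that
$\theta_0 = 0$. We start by showing that given any $G$, we have

(A.17) $\lim_{n \to \infty} P_G(\varphi_{n,0} = 1) = \begin{cases} 1, & \text{if } G^{-1}\{(1 - \varepsilon)/2\} > 0, \\ 0, & \text{if } G^{-1}\{(1 - \varepsilon)/2\} < 0 < G^{-1}\{(1 + \varepsilon)/2\}, \\ 1, & \text{if } G^{-1}\{(1 + \varepsilon)/2\} < 0. \end{cases}$

We have

(A.18) $P_G\{\varphi_{n,0}(X_n) = 1\} = P_G\{0 \notin [x_{(k_n+1)}, x_{(n-k_n)})\}$.

In Lemma 2 we have shown that $x_{(k_n+1)} \to G^{-1}\{(1 - \varepsilon)/2\}$ and
$x_{(n-k_n)} \to G^{-1}\{(1 + \varepsilon)/2\}$. Therefore, (A.17) follows from (A.18). Then

$\inf_{G \in \mathcal{F}_{\delta}(F_{\lambda})} \lim_{n \to \infty} P_G\{\varphi_{n,0}(X_n) = 1\} = 1$ for all $|\lambda| > K$

holds either if

(A.19) $\sup_{G \in \mathcal{F}_{\delta}(F_{\lambda})} G^{-1}\left(\frac{1+\varepsilon}{2}\right) = \lambda + \sup_{G \in \mathcal{F}_{\delta}(F)} G^{-1}\left(\frac{1+\varepsilon}{2}\right) < 0$

or

(A.20) $\inf_{G \in \mathcal{F}_{\delta}(F_{\lambda})} G^{-1}\left(\frac{1-\varepsilon}{2}\right) = \lambda + \inf_{G \in \mathcal{F}_{\delta}(F)} G^{-1}\left(\frac{1-\varepsilon}{2}\right) > 0$.

As in Theorem 2, we can show that

$\sup_{G \in \mathcal{F}_{\delta}(F)} G^{-1}\left(\frac{1+\varepsilon}{2}\right) = F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\}$

and

$\inf_{G \in \mathcal{F}_{\delta}(F)} G^{-1}\left(\frac{1-\varepsilon}{2}\right) = -F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\}$.
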
In order for either (A.19) or (A.20) to hold, it is required that

$|\lambda| > F^{-1}\left\{\frac{1+\varepsilon}{2(1-\delta)}\right\}$,

proving part 1 of the theorem. The proofs of parts 2 and 3 are
straightforward. □